gh-148613: Fix race in `gc_set_threshold` and `gc_get_threshold` by LindaSummer · Pull Request #150356 · python/cpython

LindaSummer · 2026-05-24T16:26:57Z

Issue

Root Cause

In the free-threaded build, gc_generation.threshold is subject to a data race: one thread reads it when its allocations trigger a GC check, while another thread may be writing it at the same time.

Lines 2025 to 2031 in cb72193

    
    if (gc->alloc_count >= LOCAL_ALLOC_COUNT_THRESHOLD) {
        // TODO: Use Py_ssize_t for the generation count.
        GCState *gcstate = &tstate->interp->gc;
        _Py_atomic_add_int(&gcstate->young.count, (int)gc->alloc_count);
        gc->alloc_count = 0;
        if (gc_should_collect(gcstate) &&

Inside gc_should_collect, gcstate->young.threshold and gcstate->old[0].threshold are read without any synchronization.

cpython/Python/gc_free_threading.c

Lines 1996 to 2004 in cb72193

    
    gc_should_collect(GCState *gcstate)
    {
        int count = _Py_atomic_load_int_relaxed(&gcstate->young.count);
        int threshold = gcstate->young.threshold;
        int gc_enabled = _Py_atomic_load_int_relaxed(&gcstate->enabled);
        if (count <= threshold || threshold == 0 || !gc_enabled) {
            return false;
        }
        if (gcstate->old[0].threshold == 0) {

At the same time, the threshold setter has no synchronization either.

cpython/Modules/gcmodule.c

Lines 170 to 175 in cb72193

    
    gcstate->young.threshold = threshold0;
    if (group_right_1) {
        gcstate->old[0].threshold = threshold1;
    }
    if (group_right_2) {
        gcstate->old[1].threshold = threshold2;

This explains why objects in a reference cycle trigger this TSAN report.
Such objects are not freed when they go out of scope, because their reference counts never reach zero, so they keep incrementing _gc_thread_state.alloc_count until it reaches LOCAL_ALLOC_COUNT_THRESHOLD.
The allocating thread then runs the GC check, which races with another thread's update of gc_generation.threshold.
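To make the sequence above concrete, here is a minimal standalone model of the batching logic. It is only a sketch: `ModelGCState`, `ModelThreadState`, and `model_record_allocation` are hypothetical names, the value 512 is assumed for illustration, and C11 `<stdatomic.h>` stands in for CPython's `_Py_atomic_*` wrappers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed value, used only to make the model concrete. */
#define LOCAL_ALLOC_COUNT_THRESHOLD 512

/* Hypothetical stand-in for the shared, per-interpreter GC state. */
typedef struct {
    atomic_int young_count;  /* shared counter, updated atomically */
    int young_threshold;     /* shared plain int: the racy field */
} ModelGCState;

/* Hypothetical stand-in for the per-thread GC state. */
typedef struct {
    int alloc_count;         /* thread-local, needs no synchronization */
} ModelThreadState;

/* Count one allocation; return true if a collection would be requested. */
static bool
model_record_allocation(ModelThreadState *ts, ModelGCState *gc)
{
    ts->alloc_count++;
    if (ts->alloc_count < LOCAL_ALLOC_COUNT_THRESHOLD) {
        return false;        /* fast path: no shared state touched */
    }
    /* Slow path: flush the local batch into the shared counter. */
    atomic_fetch_add_explicit(&gc->young_count, ts->alloc_count,
                              memory_order_relaxed);
    ts->alloc_count = 0;
    int count = atomic_load_explicit(&gc->young_count, memory_order_relaxed);
    /* Plain read of a field that another thread may be writing. */
    int threshold = gc->young_threshold;
    return threshold != 0 && count > threshold;
}
```

In this model the unsynchronized read only happens once per 512 allocations, which matches the observation that the report shows up when many uncollected (cyclic) objects accumulate in one thread.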

Proposed Changes

Use relaxed atomic loads and stores for gc_generation.threshold in both the setter and the getter.
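As a rough illustration of what this change amounts to, the sketch below uses C11 `<stdatomic.h>` in place of CPython's own atomic wrappers (such as the `_Py_atomic_load_int_relaxed` call shown earlier). `ModelGeneration`, `model_set_threshold`, and `model_get_threshold` are hypothetical names, not the functions touched by this PR.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for one GC generation. */
typedef struct {
    atomic_int threshold;
} ModelGeneration;

/* Setter: a relaxed store avoids the data race on this single field. */
static void
model_set_threshold(ModelGeneration *g, int value)
{
    atomic_store_explicit(&g->threshold, value, memory_order_relaxed);
}

/* Getter: a relaxed load returns some value that was actually stored,
 * but implies no ordering with respect to other fields. */
static int
model_get_threshold(ModelGeneration *g)
{
    return atomic_load_explicit(&g->threshold, memory_order_relaxed);
}
```

Each access is race-free on its own, but nothing here ties the three generation thresholds together.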

LindaSummer · 2026-05-25T16:24:09Z

Hi @picnixz,

Sorry to bother you.
Could you please review this PR?

Have a good day!

pablogsal · 2026-05-25T18:15:01Z

I think this fix can be correct, but it doesn't really fix the underlying problem: we are reading unsynchronized values, so this change gives us no guarantee that the values are consistent with a concurrent call that sets them. Indeed, we could read some of them from one write and the rest from another.

I think this is fine but perhaps we need exclusive access.

@nascheme what do you think?
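The mixed-read concern above can be shown deterministically. This sketch replays one possible interleaving in a single thread: the reader loads the first field, a writer updates all three, and the reader then loads the other two. All names and the example values are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the three generation thresholds. */
typedef struct {
    atomic_int t[3];
} ModelThresholds;

/* Writer: three independent relaxed stores, not one atomic update. */
static void
model_write_all(ModelThresholds *m, int a, int b, int c)
{
    atomic_store_explicit(&m->t[0], a, memory_order_relaxed);
    atomic_store_explicit(&m->t[1], b, memory_order_relaxed);
    atomic_store_explicit(&m->t[2], c, memory_order_relaxed);
}

/* Reader with a writer replayed between its first and second load,
 * which is one interleaving that two real threads could produce. */
static void
model_torn_read(ModelThresholds *m, int out[3])
{
    out[0] = atomic_load_explicit(&m->t[0], memory_order_relaxed);
    model_write_all(m, 100, 20, 20);  /* "other thread" runs here */
    out[1] = atomic_load_explicit(&m->t[1], memory_order_relaxed);
    out[2] = atomic_load_explicit(&m->t[2], memory_order_relaxed);
}
```

Starting from (700, 10, 10), the reader ends up with (700, 20, 20), a combination that no single call ever set, even though every individual access is atomic.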

picnixz · 2026-05-25T18:24:57Z

I think so as well. It will only mitigate some cases. AFAIU, the problem is that we have 3 values to read or write, but these values can be changed by another thread at any moment, right? The correct fix should be to lock the entire GC state when writing or reading.

Note that get_gc_state() already requires the caller to hold the GIL, so we could just hold it for the entirety of each function?

nascheme · 2026-05-26T13:59:41Z

The approach of "sprinkling in" relaxed atomics to silence TSAN warnings is not correct, IMO. They don't ensure ordering. I'd suggest we keep using relaxed atomics for enabled. gc_should_collect() should acquire a mutex, snapshot the fields to locals, release, then evaluate. The mutex is only needed on the slow path of record_allocation. Writers take the mutex: gc.set_threshold() in Modules/gcmodule.c, interp->gc.long_lived_total = state->long_lived_total. The mutex is only held across the snapshot/write, avoiding deadlocks.
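A minimal sketch of the snapshot pattern described above, under stated assumptions: `pthread_mutex_t` stands in for whatever lock CPython would actually use, and `LockedGCState`, `ThresholdSnapshot`, and the `locked_*` functions are hypothetical names. The checks that follow the young-generation test in the real `gc_should_collect()` are omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the GC state guarded by a mutex. */
typedef struct {
    pthread_mutex_t mutex;    /* protects the three thresholds */
    int young_threshold;
    int old0_threshold;
    int old1_threshold;
    atomic_int young_count;   /* still relaxed atomics, as suggested */
    atomic_int enabled;
} LockedGCState;

typedef struct { int young, old0, old1; } ThresholdSnapshot;

static void
locked_gc_init(LockedGCState *gc, int t0, int t1, int t2)
{
    pthread_mutex_init(&gc->mutex, NULL);
    gc->young_threshold = t0;
    gc->old0_threshold = t1;
    gc->old1_threshold = t2;
    atomic_init(&gc->young_count, 0);
    atomic_init(&gc->enabled, 1);
}

/* Writer: all three fields change under one critical section. */
static void
locked_set_threshold(LockedGCState *gc, int t0, int t1, int t2)
{
    pthread_mutex_lock(&gc->mutex);
    gc->young_threshold = t0;
    gc->old0_threshold = t1;
    gc->old1_threshold = t2;
    pthread_mutex_unlock(&gc->mutex);
}

/* Reader: copy the fields to locals; the lock covers only the copy. */
static ThresholdSnapshot
locked_get_threshold(LockedGCState *gc)
{
    pthread_mutex_lock(&gc->mutex);
    ThresholdSnapshot s = { gc->young_threshold, gc->old0_threshold,
                            gc->old1_threshold };
    pthread_mutex_unlock(&gc->mutex);
    return s;
}

/* Evaluate on the snapshot after the mutex has been released. */
static bool
locked_should_collect(LockedGCState *gc)
{
    ThresholdSnapshot s = locked_get_threshold(gc);
    int count = atomic_load_explicit(&gc->young_count, memory_order_relaxed);
    int enabled = atomic_load_explicit(&gc->enabled, memory_order_relaxed);
    if (count <= s.young || s.young == 0 || !enabled) {
        return false;
    }
    return true;  /* later checks from the real function are omitted */
}
```

Because the lock is released before any decision is made or any collection starts, the critical section never calls back into the allocator or the collector, which is the property that keeps this approach free of lock-ordering problems.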

LindaSummer · 2026-05-26T14:07:56Z

Hi @nascheme , @picnixz and @pablogsal ,

Thanks very much for your reviews and suggestions!
I will follow the mutex approach to keep the thresholds consistent.

LindaSummer added 4 commits May 24, 2026 22:52

fix the gc_set_threshold_impl race

4f57f14

add unit test

104b4e6

add blurb file

c00b965

use relaxed for load

77f72d9

LindaSummer requested a review from pablogsal as a code owner May 24, 2026 16:26

bedevere-app Bot added the awaiting review label May 24, 2026

bedevere-app Bot mentioned this pull request May 24, 2026

Data race between gc_should_collect and gc_set_threshold_impl #148613

Open

Merge branch 'main' into gc_state_tsan

4cdb1dc

LindaSummer mentioned this pull request May 25, 2026

gh-150411: fix gc_generation.count race #150413

Open

LindaSummer wants to merge 5 commits into python:main from LindaSummer:gc_state_tsan

4 participants
